Recovery in distributed systems using optimistic message logging and checkpointing
Similar resources
Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing
In a distributed system using message logging and checkpointing to provide fault tolerance, there is always a unique maximum recoverable system state, regardless of the message logging protocol used. The proof of this relies on the observation that the set of system states that have occurred during any single execution of a system forms a lattice, with the sets of consistent and recoverable system ...
Optimistic Message Logging for Independent Checkpointing in Message-Passing Systems
Message-passing systems with communication protocol transparent to the applications typically require message logging to ensure consistency between checkpoints. This paper describes a periodic independent checkpointing scheme with optimistic logging to reduce performance degradation during normal execution while keeping the recovery cost acceptable. Both time and space overhead for message logg...
Output Driven Distributed Optimistic Message Logging and Checkpointing
Although optimistic fault tolerance methods using message logging and checkpointing have the potential to provide highly efficient, transparent fault tolerance in distributed systems, existing methods are limited by several factors. Coordinating the asynchronous message logging progress among all processes of the system may cause significant overhead, limiting their ability to scale to large systems ...
An optimistic checkpointing and message logging approach for consistent global checkpoint collection in distributed systems
Checkpointing and rollback recovery are widely used techniques for achieving fault-tolerance in distributed systems. In this paper, we present a novel checkpointing algorithm which has the following desirable features: A process can independently initiate consistent global checkpointing by saving its current state, called a tentative checkpoint. Other processes come to know about a consistent g...
Distributed System Fault Tolerance Using Message Logging and Checkpointing
Fault tolerance can allow processes executing in a computer system to survive failures within the system. This thesis addresses the theory and practice of transparent fault tolerance methods using message logging and checkpointing in distributed systems. A general model for reasoning about the behavior and correctness of these methods is developed, and the design, implementation, and performance of ...
Journal
Journal title: Journal of Algorithms
Year: 1990
ISSN: 0196-6774
DOI: 10.1016/0196-6774(90)90022-7